FOREWORD


The  need  to  discriminate  and  classify  spatial  patterns  is  central  to  pattern  recognition.  The 
emergence  of  the  field  of  computational  statistics,  made  possible  by  recent  advances  in  digital 
computing,  has  opened  up  a  new  way  of  approaching  pattern  recognition.  In  particular,  it  is  no 
longer  necessary  to  assume  restrictive  statistical  models  for  spatially  correlated  data.  It  thus 
becomes  important  to  develop  model-free  approaches  to  the  description  of  spatial  patterns. 

This  work  was  supported  in  part  by  the  Office  of  Naval  Research  (R&T  No.  4424314)  and  the 
Naval  Surface  Warfare  Center,  In-house  Laboratory  Independent  Research  Program. 

This  report  was  reviewed  by  Dr.  Richard  Lorey,  Head,  Advanced  Computation  Technology 
Group. 
INTRODUCTION


Consider the problem of detecting and classifying homogeneous regions in a spatially correlated
signal such as an image. One approach has been to assume that some model describes the spatial
process and to fit the model parameters. As an example, consider the model for a one-dimensional
simultaneous autoregressive (SAR) process (References 1 and 2)

    z_i = \mu_i + \sum_j K_{ij} (z_j - \mu_j) + \epsilon_i ,    (1)

where $z_i$ is the signal value at bin $i$ with mean value $\mu_i$ and with independent, identically distributed
(iid) noise $\epsilon_i$ (usually assumed normal). The weights $K_{ij}$ specify the neighborhood and weighting
of  the  spatial  dependence.  To  use  this  model,  a  particular  form  of  the  weights  must  be  chosen  and 
a  parametric  form  for  the  iid  noise  is  assumed.  Fitting  the  model  to  the  signal  then  consists  of 
determining  the  parameters  of  the  noise  model.  For  Gaussian  noise,  the  variance  of  the  noise  is 
the  unknown  parameter. 
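
As a rough illustration of the fitting step just described, the following sketch simulates a special case of Equation (1) and recovers the noise variance from the residuals. The one-sided weights ($K_{ij} = \rho$ for $j = i-1$ and zero otherwise), the constant mean, and the function names `simulate_sar` and `fit_noise_variance` are assumptions made for this example only; they are not taken from the report.

```python
import random

def simulate_sar(n, mu, rho, sigma, seed=0):
    """Simulate z_i = mu + rho*(z_{i-1} - mu) + eps_i with eps_i ~ N(0, sigma^2).

    This is Equation (1) with the assumed weights K_ij = rho for j = i-1, else 0.
    """
    rng = random.Random(seed)
    z = []
    prev = mu  # arbitrary initial condition: start the recursion at the mean
    for _ in range(n):
        prev = mu + rho * (prev - mu) + rng.gauss(0.0, sigma)
        z.append(prev)
    return z

def fit_noise_variance(z, mu, rho):
    """Estimate the noise variance from residuals, treating mu and rho as known."""
    resid = [(z[i] - mu) - rho * (z[i - 1] - mu) for i in range(1, len(z))]
    return sum(e * e for e in resid) / len(resid)

z = simulate_sar(n=20000, mu=1.0, rho=0.6, sigma=0.5)
sigma2_hat = fit_noise_variance(z, mu=1.0, rho=0.6)
print(round(sigma2_hat, 3))  # should be close to sigma**2 = 0.25
```

The estimate is only meaningful because the weights and the Gaussian noise form were fixed in advance, which is exactly the modeling burden discussed next.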

This approach has several drawbacks: a parametric model must be assumed, parameter estimators can be
derived for only a limited number of models, and the resulting model has limited ability to
describe accurately a wide range of correlated phenomena. While other models of
spatial correlation abound (References 1 and 2), they suffer from analogous drawbacks.

An  alternative  approach  is  to  treat  the  correlated  observations  as  if  they  were  independent  or 
uncorrelated  and  proceed  to  perform  nonparametric  or  semiparametric  density  estimates  under 
this  assumption.  A  good  example  of  this  approach  as  applied  to  fractal  dimension  analysis  is 
given  in  Reference  3,  while  an  application  to  change  point  detection  can  be  found  in  Reference  4. 
This latter approach has demonstrated significant promise for the discrimination of different spatial
processes even when no simple parametric model such as Equation (1) adequately describes
the processes.

When  applying  this  alternative  approach,  it  is  desirable  to  know  explicitly  what  assumptions 
are  being  made,  and  what  the  ramifications  of  those  assumptions  may  be.  It  is  the  purpose  of  this 
report  to  examine  and  make  explicit  the  required  assumptions  as  well  as  discuss  some  of  their 
ramifications. 

In the next section, a sequence of mappings is developed for discrimination of spatial processes.
This is followed by a section devoted to the relationship between these mappings and the
set of probability density functions (pdf’s). The final section presents some concluding remarks.
MULTISET MAPPINGS AND ERGODICITY


Nonparametric density estimation techniques abound for iid data. Examples (Reference 5) of nonparametric
density estimators include histograms, frequency polygons, average shifted histograms, and
kernel estimators. Semiparametric density estimators are based on the notion of retaining the
flexibility of the nonparametric techniques while limiting the growth rate in the model
complexity. The adaptive mixture model (Reference 6) is a good example. It is a parametric mixture model that
can add terms (and hence complexity) in a data-driven manner. It thus does not have a fixed number
of parameters and hence cannot be properly termed parametric. Since its complexity grows at
a much slower rate than that of the nonparametric techniques, a suitable term is semiparametric.

Thus,  on  the  one  hand  there  is  a  powerful  set  of  density  estimation  tools  that  do  not  make  any 
model  assumptions  but  require  iid  data,  while  on  the  other  hand  the  problem  under  consideration 
involves  spatially  correlated  data  that  is  manifestly  non-iid.  This  presents  a  conundrum  on  how  to 
proceed.  At  first  glance,  the  choices  are  to  either  scrap  the  nonparametric  tools  when  the  data  is 
correlated,  or  to  make  the  false  assumption  that  the  correlations  do  not  exist. 

An  alternative  that  permits  the  use  of  the  nonparametric  tools  while  not  making  the  iid 
assumption  blindly  is  to  develop  a  sequence  of  mappings  from  the  correlated  data  to  data  sets 
where  succeeding  sets  in  the  sequence  (may)  retain  more  and  more  of  the  spatial  information. 

Consider only spatial processes on a lattice. Let $D$ represent the index set for the lattice points.
Thus for a two-dimensional $n \times n$ lattice, $D = \{(i,j) \mid i = 1,\ldots,n;\ j = 1,\ldots,n\}$. Then the random process
can be written as

    \{Z(s) \mid s \in D\} ,    (2)

with a particular realization of this process being denoted by

    \{z(s) \mid s \in D\} .    (3)

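The random process Equation (2) is usually defined (Reference 1) through the finite-dimensional distributions

    F_{s_1,\ldots,s_m}(z_1,\ldots,z_m) = P\{Z(s_1) \le z_1, \ldots, Z(s_m) \le z_m\} ,  m \ge 1 .    (4)

This in turn leads to the consideration of the marginal pdf’s (provided they exist)

    f(z_i) = \frac{d}{dz_i} P(Z(s_i) \le z_i) ,    (5)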
as well as joint pdf’s, a two-dimensional example of which is

    f(z_i, z_j) = \frac{\partial}{\partial z_i} \frac{\partial}{\partial z_j} P(Z(s_i) \le z_i, Z(s_j) \le z_j) .    (6)


If there are a large number of realizations of the same process, then one option is to estimate
the density function separately at each lattice site. If, however, there is only one realization and the
interest is in characterizing the process in terms of densities, assumptions must be made to proceed.
One logical assumption is that of strong (or strict) stationarity of the process, defined by the
conditions that


    F_{s_1+h,\ldots,s_m+h}(z_1,\ldots,z_m) = F_{s_1,\ldots,s_m}(z_1,\ldots,z_m)    (7)

for all $m \ge 1$ and all $(s_j + h) \in D$. Under this assumption,

    f(z_i) = f(z_j)  \forall (s_i, s_j \in D) ,    (8)


with  similar  identities  holding  for  all  possible  joint  pdf’s. 

Next  suppose  that  it  is  desired  to  characterize  a  strongly  stationary  process  by  its  marginal  pdf 
based  on  a  single  realization  of  the  process.  The  estimate  can  be  based  on  the  set  of  values  from 
this  single  realization.  In  this  case,  the  empirical  cumulative  distribution  function  (ecdf),  based  on 
n  elements  in  D  is 


    F_n(z) = \frac{\#\{z_i \le z\}}{n} .    (9)


For the ecdf to converge to the true cumulative distribution function (cdf), the observations
must be identically distributed (guaranteed by the assumption of strong stationarity) and the number
of observations must go to infinity. Additionally, the usual requirement is that the observations
are independent as well as identically distributed. Since the observations are in general not
independent, an alternative requirement is that


    \lim_{n \to \infty} F_n(z) = F(z) ,    (10)

where

    n \to \infty \iff \#\{s_i \in D\} \to \infty ,    (11)

and the limit is the same for almost every realization of the process. This requirement is just the
requirement of ergodicity (Reference 1).
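
The convergence required by Equation (10) can be illustrated numerically. The sketch below computes the ecdf of Equation (9) from a single realization of a correlated but ergodic sequence and compares it with the known marginal cdf. The first-order autoregressive sequence with a standard normal marginal, the evaluation point 0.5, and the helper names are assumptions chosen for the illustration, not part of the report.

```python
import math
import random

def ecdf(values, z):
    """Fraction of observations less than or equal to z, as in Equation (9)."""
    return sum(1 for v in values if v <= z) / len(values)

def ar1(n, rho, seed=0):
    """One realization of a stationary AR(1) sequence with N(0, 1) marginal."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)               # start in the stationary distribution
    scale = math.sqrt(1.0 - rho * rho)    # keeps the marginal variance at 1
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
    return out

def normal_cdf(z):
    """Standard normal cdf, the true marginal F(z) for this example."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

errors = {}
for n in (100, 10000, 200000):
    sample = ar1(n, rho=0.8)
    errors[n] = abs(ecdf(sample, 0.5) - normal_cdf(0.5))
    print(n, errors[n])
```

Because neighboring values are strongly correlated, the error at a given n is typically larger than it would be for n independent draws; ergodicity only guarantees that it vanishes as n grows.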

In  practice,  data  sets  are  always  finite  corresponding  to  a  finite  number  of  elements  in  the 
index  set  D.  Define  the  following  mapping  that  simply  maps  a  particular  realization  of  a  process 
Equation  (2)  to  a  set  of  values  where  in  general  each  element  in  the  new  set  is  indexed  by  the 
number  of  times  it  occurred  in  the  realization.  Formally, 

    M_0(z(s) \mid s \in D) \to y ;    (12)

that is, each (site-indexed) value of the realization of the process is mapped to simply its value,
while the map operating on the set gives

    M_0(\{z(s) \mid s \in D\}) \to \{ y \mid n_y = \#(z(s) = y) \} .    (13)


This  last  set  is  formally  a  multiset  as  there  is  an  associated  value  (the  number  index)  with  each 
element.  The  formal  requirement  of  a  multiset  representation  is  only  needed  when  the  same  value 
can  occur  in  the  process  with  nonzero  probability. 
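
A minimal sketch of the map in Equations (12) and (13) follows, assuming the realization is stored as a dictionary from lattice site to value (a representation chosen here for convenience). The name `m0` is hypothetical, and Python's `collections.Counter` plays the role of the multiset.

```python
from collections import Counter

def m0(realization):
    """Map a realization {site: value} to the multiset {value: count}."""
    return Counter(realization.values())

# A small 3 x 2 binary realization, indexed by lattice site (i, j).
z = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1, (2, 0): 1, (2, 1): 1}

multiset = m0(z)                       # site indices are discarded
n = sum(multiset.values())             # number of lattice sites
pmf = {y: count / n for y, count in multiset.items()}
print(multiset, pmf)
```

Normalizing the counts by the number of sites gives the relative frequencies from which a density estimate would be built.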

Under  the  assumption  of  ergodicity,  a  pdf  and/or  cdf  may  be  estimated  by  any  of  the  methods 
normally  employed  with  iid  data. 

The  map  definition  can  be  extended  to  yield  multivariate  sets  as  follows.  Let 

    M_h(z(s), z(s+h) \mid s, s+h \in D) \to (y_1, y_2)

    M_{h_1,h_2}(z(s), z(s+h_1), z(s+h_2) \mid s, s+h_1, s+h_2 \in D) \to (y_1, y_2, y_3)

    \vdots

    M_{h_1,\ldots,h_m}(z(s), \ldots, z(s+h_m) \mid s, \ldots, s+h_m \in D) \to (y_1, \ldots, y_{m+1})    (14)

so  that,  operating  on  the  set  of  process  values, 
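    M_h(\{z(s) \mid s \in D\}) \to \{ (y_1, y_2) \mid n_y = \#((z(s) = y_1) \wedge (z(s+h) = y_2)) \}

    M_{h_1,h_2}(\{z(s) \mid s \in D\}) \to \{ (y_1, y_2, y_3) \mid n_y = \#((z(s) = y_1) \wedge (z(s+h_1) = y_2) \wedge (z(s+h_2) = y_3)) \}

    \vdots

    M_{h_1,\ldots,h_m}(\{z(s) \mid s \in D\}) \to \{ (y_1, \ldots, y_{m+1}) \mid n_y = \#((z(s) = y_1) \wedge \cdots \wedge (z(s+h_m) = y_{m+1})) \}    (15)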
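
The multivariate maps can be sketched in the same style. The function below counts tuples of values at a site and at its shifted sites, keeping only those sites for which every shifted site also lies in the index set. The dictionary representation, the name `m_h`, and the small test lattice are assumptions made for this illustration.

```python
from collections import Counter

def m_h(realization, offsets):
    """Multiset of tuples (z(s), z(s+h_1), ..., z(s+h_m)).

    Only sites s for which every shifted site s+h_k is in the index set
    contribute, matching the condition s, s+h_1, ..., s+h_m in D.
    """
    counts = Counter()
    for s, value in realization.items():
        shifted = [tuple(a + b for a, b in zip(s, h)) for h in offsets]
        if all(t in realization for t in shifted):
            counts[(value,) + tuple(realization[t] for t in shifted)] += 1
    return counts

# A 3 x 3 lattice with alternating values z(i, j) = (i + j) mod 2.
z = {(i, j): (i + j) % 2 for i in range(3) for j in range(3)}

pairs = m_h(z, [(1, 0)])             # one offset: pairs of values
triples = m_h(z, [(1, 0), (0, 1)])   # two offsets: triples of values
print(pairs)
print(triples)
```

On a finite lattice the number of contributing sites shrinks as offsets are added, so higher-order maps have fewer tuples available for density estimation.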


This sequence of mappings can be augmented by considering linear combinations of the random
variables $Z(s)$ corresponding to performing convolutions or weighted sums. Let this derived
process be denoted by

    \{X(s) \mid s \in D\} ,    (16)

with a particular realization of this process being denoted by

    \{x(s) \mid s \in D\} .    (17)


Then all of the mappings can be used with $x$ in place of $z$. As an additional alternative, consider mixed
mappings where, for example,

    M_0(z(s), x(s) \mid s \in D) \to (y_1, y_2)

    M_0(\{(z(s), x(s)) \mid s \in D\}) \to \{ (y_1, y_2) \mid n_y = \#((z(s) = y_1) \wedge (x(s) = y_2)) \} .    (18)

This type of mapping is suggested by the SAR process Equation (1), where a joint density such as

    f(z(s_i), x(s_i))    (19)

might be of interest.
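
A mixed mapping of this kind might be sketched as follows for a one-dimensional signal. The derived process is taken here to be the sum of the two nearest neighbors, which is only one possible weighted sum and is an assumption of this example, as are the names `neighbor_sum` and `m0_mixed`.

```python
from collections import Counter

def neighbor_sum(z):
    """Derived process x(i) = z(i-1) + z(i+1), defined at interior sites only."""
    return {i: z[i - 1] + z[i + 1] for i in range(1, len(z) - 1)}

def m0_mixed(z, x):
    """Multiset of pairs (z(s), x(s)) over the sites where x is defined."""
    return Counter((z[i], x[i]) for i in x)

z = [0, 1, 1, 0, 1, 0, 0, 1]
x = neighbor_sum(z)
pairs = m0_mixed(z, x)
print(pairs)
```

The resulting pairs pool each value with a summary of its neighborhood, which is the kind of joint information that Equation (1) relates through the weights.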
PDFs CORRESPONDING TO MULTISET MAPPINGS


In the limit as the lattice size goes to infinity, each multiset mapping corresponds to a density
if the spatial process is strongly stationary. (It may or may not correspond to a density under
weaker conditions.) Under any given multiset mapping, a many-to-one mapping can exist
between the set of all ergodic spatial processes and the set of pdf’s (including the null element).
This serves to produce a set of equivalence classes of ergodic spatial processes under each mapping.
This can be seen pictorially in Figure 1, which shows set relationships under a multiset mapping
$M_k$ in the limit of an infinite number of observations. Elements of the set of spatial statistics
undergo a many-to-one mapping to a set of equivalence classes. Each equivalence class corresponds
to a unique iid (potentially multivariate or joint) pdf. One of the equivalence classes corresponds
to the nonexistence of a pdf.


FIGURE 1. SET RELATIONSHIPS UNDER A MULTISET MAPPING M_k


Next,  consider  what  happens  when  these  assumptions  are  applied  to  a  process  that  is  not 
ergodic  or  even  stationary.  Asymptotically,  the  pdf  corresponding  to  a  mapping  may  or  may  not 
exist.  Alternatively,  there  is  the  case  where  different  realizations  of  a  spatial  process  (even  in  the 
limit  of  an  infinite  lattice)  converge  to  different  densities.  The  former  case,  where  no  pdf  exists, 
can  be  handled  by  augmenting  the  set  of  pdf’s  with  an  element  corresponding  to  a  null  pdf.  The 
latter case amounts to the process having a many-to-many correspondence rather than the many-to-one
correspondence shown in Figure 1.
Consider the following example depicted in Figure 2. Figure 2a shows two checkerboard patterns
with different scales. In Figure 2b, the two patterns (in the limit of infinite spatial extent)
map to the same pdf under $M_0$. Figure 2c uses a discrete one-dimensional representation of the
two-dimensional joint densities resulting from the multiset map $M_{(1,0)}$. The two elements of the
ordered pairs correspond to the independent coordinates of the two-dimensional joint pdf. The joint pdf has
been mapped to a single axis here for simplicity. Additionally, any binary pattern with equal numbers
of pixels with values of one and zero will fall into the same equivalence class as the checkerboard
patterns under $M_0$. Any patterns with unequal numbers of on and off pixels or with
intermediate values will separate into different equivalence classes under the $M_0$ map.

Thus it is seen that even though the $M_0$ map discards all ordering information, a significant
amount of information remains that can be used via pdf’s to group spatial processes into equivalence
classes. If it is desired to discriminate between two spatial processes that map to the same
equivalence class under $M_0$, it suffices to find another element in the multiset map sequence for
which discrimination will occur.
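
The checkerboard example can be reproduced on a finite lattice with a short sketch. Two 8 x 8 checkerboards with square sizes of one and two pixels give identical relative frequencies under $M_0$ but different ones under the pair map with offset (1,0). The lattice size, the function names, and the use of a finite lattice in place of the infinite limit are assumptions made for the illustration.

```python
from collections import Counter

def checkerboard(n, block):
    """n x n binary checkerboard whose squares are block x block pixels."""
    return {(i, j): ((i // block) + (j // block)) % 2
            for i in range(n) for j in range(n)}

def multiset_map(z, h=None):
    """M_0 when h is None; otherwise M_h, counting pairs (z(s), z(s+h))."""
    if h is None:
        return Counter(z.values())
    pairs = Counter()
    for (i, j), value in z.items():
        t = (i + h[0], j + h[1])
        if t in z:                      # keep only sites with s+h inside D
            pairs[(value, z[t])] += 1
    return pairs

def normalize(counts):
    """Convert counts to relative frequencies."""
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}

fine = checkerboard(8, 1)
coarse = checkerboard(8, 2)

same_under_m0 = normalize(multiset_map(fine)) == normalize(multiset_map(coarse))
same_under_m10 = (normalize(multiset_map(fine, (1, 0)))
                  == normalize(multiset_map(coarse, (1, 0))))
print(same_under_m0, same_under_m10)
```

For the fine pattern every neighboring pair differs, while the coarse pattern also produces the pairs (0,0) and (1,1), so the second map separates the two patterns that the first cannot.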

The preceding example was chosen for its simplicity to illustrate the concepts that have
been presented. The more usual application involves a continuous-valued function at each lattice
point.

In this case, the pdf is continuous and finite sample size becomes an issue. While all of the
nonparametric density estimators mentioned earlier are designed for use with continuous densities,
all possess bias and there will exist variance due to finite sample size. In this case an exact
match between densities cannot be required because of the error in estimating them. Rather, the
classification of the finite samples must be based on a measure of the difference between density
estimates. Examples of such a measure include the $L_1$, $L_2$, and Kullback-Leibler distances. Because of
bias and variance in the density estimates, it is necessary to set a threshold for grouping density
estimates (and hence data sets) into equivalence classes. Unfortunately, an appropriate threshold
depends on the problem and the sample size and will not be addressed in this report.
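
For density estimates expressed as bin probabilities on a common set of bins, the three measures named above might be computed as in the sketch below. The function names, the small constant guarding empty bins, and the threshold value are assumptions for illustration only; the threshold in particular is arbitrary, since its choice is not addressed here.

```python
import math

def l1(p, q):
    """Sum of absolute differences between bin probabilities."""
    return sum(abs(a - b) for a, b in zip(p, q))

def l2(p, q):
    """Square root of the sum of squared differences between bin probabilities."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kl(p, q, eps=1e-12):
    """Kullback-Leibler measure sum p*log(p/q); eps guards empty bins in q."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)

def same_class(p, q, threshold):
    """Group two estimates together when their L1 difference is small enough."""
    return l1(p, q) <= threshold

p = [0.5, 0.5]     # e.g. equal numbers of on and off pixels
q = [0.25, 0.75]   # e.g. three times as many on pixels as off pixels

print(l1(p, q), l2(p, q), kl(p, q))
print(same_class(p, q, threshold=0.1))  # 0.1 is an arbitrary illustrative value
```

Note that the Kullback-Leibler measure is not symmetric in its arguments, so `kl(p, q)` and `kl(q, p)` generally differ.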
FIGURE 2A. CHECKERBOARD PATTERNS AT TWO DIFFERENT SCALES

FIGURE 2B. DISCRETE PROBABILITY DENSITY FUNCTIONS CORRESPONDING TO THE TWO CHECKERBOARD
PATTERNS UNDER THE MULTISET MAP M_0
[Two panels; horizontal axis values 0 and 1; vertical axis tick at 0.5.]

FIGURE 2C. DISCRETE ONE-DIMENSIONAL REPRESENTATION OF THE TWO-DIMENSIONAL JOINT
DENSITIES RESULTING FROM THE MULTISET MAP M_{(1,0)}
[Two panels; horizontal axis categories (0,0), (1,0), (0,1), (1,1).]
CONCLUSIONS


An approach to characterizing spatial statistics has been presented. It is based on one
or more mappings that convert spatially correlated values to a form that allows nonparametric
and semiparametric density estimation techniques to characterize the spatial statistics
through the density estimate of the mapped values. Since nonparametric and semiparametric density
estimation techniques typically are based on the iid assumption, they must be used on data
sets that are iid or else on data that behave as if they were iid asymptotically. The mappings that have
been proposed in effect discard any spatial correlation information and result in multisets composed
of n-tuples of distinct values with associated indices that count the number of occurrences of
each distinct n-tuple of values. These multisets have no explicit or implicit ordering of the
n-tuples and hence possess an asymptotic equivalence with n-dimensional joint pdf’s if the spatial
process that generated them is ergodic.

Using the multiset mappings corresponds to asking which asymptotically equivalent iid density
the spatial process in question corresponds to under a particular multiset map.
For ergodic spatial processes this correspondence is well defined. As was seen in the
example, a many-to-one mapping exists from ergodic spatial processes to an asymptotically equivalent
density. This serves to divide spatial processes into equivalence classes that will differ under
different mappings.

Thus,  this  approach  can  be  viewed  as  one  with  the  intent  to  discriminate  between  particular 
types  of  spatial  processes  (using  nonparametric/semiparametric  tools)  rather  than  to  directly 
describe  the  spatial  processes  in  terms  of  specific  models. 

Finally,  to  place  this  approach  in  perspective,  consider  these  quotes  from  Reference  1. 

“Ergodicity is an assumption made to allow inference to proceed for a series of
nonindependent observations. It might only be verifiable in the sense that one fails
to reject it. This should not be too worrisome because scientific discovery generally
proceeds in this way.”


and 


“It seems that statisticians are using only the part of the ergodicity assumption that
guarantees the sample mean and covariances converge to their population counterparts.”

Thus, this report started with the usual method of dealing with nonindependent observations,
but rather than simply computing sample means and covariances, a semiparametric/nonparametric
approach to characterizing a broad class of spatial processes has been formulated. This
approach is based on a sequence of mappings and their resulting semiparametric/nonparametric
probability density estimates, which serve to group spatial processes into sets of equivalence
classes that are a function of the mapping used. While the approach could have been formulated
without explicit reference to the multiset mappings, the goal was to explicitly lay out as many of
the assumptions as possible that have been implicit in earlier work.

In a companion report, an example of this approach involving continuous densities will be
provided. In future work, some related issues such as finite sample size, the role of imposed measure,
and the finite sample impact of nonergodicity will be examined in more detail.